A generalized spatiotemporal covariance model 
for stationary background in analysis of MEG data 
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Abstract — Using a noise covariance model based on a single 
Kronecker product of spatial and temporal covariance in the 
spatiotemporal analysis of MEG data was demonstrated to 
provide improvement in the results over that of the commonly 
used diagonal noise covariance model. In this paper we present 
a model that is a generalization of all of the above models. It 
describes models based on a single Kronecker product of spatial 
and temporal covariance as well as more complicated multi- 
pair models together with any intermediate form expressed as 
a sum of Kronecker products of spatial component matrices 
of reduced rank and their corresponding temporal covariance 
matrices. The model provides a framework for controlling the 
tradeoff between the described complexity of the background 
and computational demand for the analysis using this model. 
Ways to estimate the value of the parameter controlling this 
tradeoff are also discussed. 

I. Introduction 

Given that MEG/EEG is measured in M trials, on L 
sensors and in C time samples, let E m be the L by C single 
trial noise matrix at trial m. And to simplify our notation 
further we assume that the background in E m is zero-mean. 
In this case, the sample spatiotemporal covariance matrix of 
dimension N = LC for the averaged over trials noise is 



a = 



M 



1 M 

— ^ « ec ( E m)wec(E m ) T , 



(1) 



where wec(E) is all the columns of E stacked in a vector. 

Due to the tremendously large number of parameters 
(0(L 2 C 2 )) compared to the relative sparsity of background 
data typically available many attempts to model the co- 
variance has been made. In order to be useful for source 
analysis a model of spatiotemporal covariance should capture 
the structure of the background with as few parameters as 
possible. Finally to be useful the model should be invertible, 
so it can be used in a likelihood formulation. 

To capture the structure in the brain background signal 
we note that there is evidence that the background activity 
of the brain has a stationary spatial distribution over the 
time of interest in the neuroscientific experiments [6]. This 
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observation provides for the current approach in exploratory 
fMRI analysis to represent four dimensional spatiotemporal 
data as a two dimensional matrix X, which is decomposed 
into a sum of Kronecker products contaminated by normal 
noise [1]: 



X = a r ® b r + E. 



(2) 



Relying on such evidence, we can make assumptions 
about the background activity in the brain in the MEG 
(magnetoencephalography) measurements: 

1) The measured background is a superposition of R 
spatially fixed sources with independent temporal be- 
havior. 

2) Each spatial source produces correlated Gaussian 
noise. 

These are the basis for modeling spatiotemporal covari- 
ance adopted in this work. The final goal is to find spatial 
components according to a justified criteria and then estimate 
respective temporal covariances for each component. For L 
sensors performing the measurement we can extract at most 
L spatial components from the measured background, which 
gives us the starting form for the spatiotemporal covariance 
matrix: 



(3) 



where Si = vivf is a spatial component represented as a 
singular matrix of rank 1. 

In what follows we demonstrate how existing spatiotem- 
poral covariance models based on the Kronecker product can 
be described in the proposed framework. This includes the 
models based on one Kronecker product suggested in [2], 
[3] and on a series of Kronecker products introduced by 
us [4]. Finally we extend the framework by making it span 
the range of models between one pair Kronecker models and 
L component models. 

II. One-pair Kronecker product 

In [2] was proposed a model that casts spatiotemporal 
covariance in the form that uses the Kronecker product: 



cov 



(4) 



Temporal covariance is a C x C matrix T and spatial 
covariance is a L x L matrix S, where C is the number 
of time samples and L is the number of sensors. 

An important feature of this model is its computationally 
fast inverse calculation (T®S) _1 = T _1 <g>S _1 . This feature 



is required of all covariance models to make them useful in 
the analysis. All models presented in this paper are invertible 
and manageable within reasonable time. 

Since the background is assumed to have a Gaussian 
distribution the model can be estimated in the maximum 
likelihood framework as in [2]. 

S = ^(sE^). (5) 

\ m=l / 
\ m=l / 

Here M is the number of available L x C spatiotemporal 
measurements E m . 

In essence, the main assumption of the one-pair Kronecker 
product model is that all spatial components of the back- 
ground have one and the same temporal covariance structure. 
To demonstrate the validity of this statement we perform 
spectral decomposition of the spatial covariance: 

L 

T^^aiSu (6) 

1=1 

where Si = vivf is an orthonormal basis component repre- 
sented as a singular matrix. Using the identity A<g)(B+C) = 
A (g) B + A <gi C, the expression (J6J can be represented as: 

L L 

^2T®aiSi = ^cr/T®5 ; . (7) 

i=i 1=1 
The Left Hand Side of the equation makes it obvious that 
each spatial component has the same temporal covariance. 
The contribution of each temporal covariance is weighted 
by the variance of the corresponding orthonormal spatial 
component, as seen from the Right Hand Side of @. When 
we set (T/T = T/ we see how our general model from 
subsumes the one pair model. 

A parameterized model of the same form as in @ was 
introduced in order to achieve a further reduction in the 
number of parameters [3]. This model is also a special case 
of 0. To see that it is enough to follow the same logic as 
we just did for the unparameterized model. 

III. Multi-pair models 

In a more realistic case it is easy to picture a situation 
where several noise generators belonging to separate spatial 
components also have different temporal structures. 

Two models covering such case were introduces in [5] 
one in which spatial components Si are components of an 
orthonormal basis and the other with spatial components 
from an independent basis. The following is assumed: 

1) Spatiotemporal noise is generated by L spatially or- 
thogonal (or independent) generators which do not 
change their location during the period of interest. 

2) Each spatial component is uncorrelated with other 
components. 

3) Measured signal has a Gaussian distribution with zero 
mean. 



Both models have the form as in and any set of 
orthogonal or independent components can be used. As 
demonstrated in [5] estimation of orthogonal components 
through singular value decomposition and of independent 
components through an independent component analysis 
algorithm work well. 

For given orthogonal components Si = vivf with vi being 
a column of an orthogonal matrix V, Maximal Likelihood 
estimate of the model is: 

L 

COV « COV = ^Ti®Si, (8) 

1=1 

1 M 

m— 1 

For given independent components IZi = wiwj with wi 
being a row of a full rank matrix W^ 1 , Maximal Likelihood 
estimate of the model is: 

L 

COV«COV = ^T ; ®-R ; , (9) 

1=1 
1 M 

?~i = ± J2 Ef n WW T ^WW T E m . 

m— 1 

These models are more general than the model of Sec- 
tion [H] and the independent basis multi-pair model of l|9} 
is the most general in terms of having the largest number 
of degrees of freedom. This increased complexity brings 
advantages in being able to more accurately model the 
complex empirical noise covariance, which lead to improved 
source localization performance as demonstrated in [5]. 
Furthermore, inversion is computationally efficient due to the 
built in structure. For the orthogonal basis model: 

1 L 

COV - ^(Tzr 1 ®^. (10) 

And for the independent basis model: 

, 1 L 

COV =^(T i )- 1 ®[WW T 7l i WW T ]. (11) 

1=1 

Details of the derivation can be found in [5]. 

IV. Generalized multi-pair model 

Even though the models in Section[ni]are more descriptive 
and yield better results than single-pair models, the problem 
with them is the computational complexity of each iteration 
in the analysis where the likelihood needs to be computed. 
The negative log-likelihood function in the case of using a 
multipair model looks like: 

L 

Y,(E b - E M ) T Si{E b - Em)^ 1 . (12) 

1=1 

Here E\, and Em are L x C matrices of spatiotemporal 
measurements and the current prediction made by the model, 
respectively. Calculation of fl2J takes 0(L(L 2 C + LC 2 + 



C 3 )) operations and it is L times bigger than for one -pair 
models. 

Yet, notice that multi-pair models utilize all spatial com- 
ponents which in the case of the orthogonal basis model 
also includes those components that have small singular 
values. Without loss of generality we consider first only the 
orthogonal basis multi-pair model. Separating the first r < L 
most significant singular values, the model can be expressed 
as 



E 



r. 



L 



k=r+l 



(13) 



The second term in expression Jl 3i contains small vari- 
ance relative to the first one. Conventional dimensionality 
reduction techniques like PCA would eliminate this term 
from further consideration. Though this would make the 
model smaller in terms of storage requirements it will at the 
same time render it useless in the analysis. The inversion 
following the lines of J10I would become impossible. At 
the same time if we set temporal covariance of all spatial 
components with small singular values to be the same as it 
is assumed in the one-pair model @ we obtain the following 
expression: 



E 



L 



fe=r+l 

Expression dl4> is a general form describing spatiotem- 
poral covariance models based on the assumption that the 
brain has stationary spatial distribution of the background 
activity over the time of interest [6]. It can describe the one- 
pair model when r is set to zero and multi-pair models when 
r = L. 

A feature absolutely required from this model is that 
it be invertible. Otherwise, the model cannot be used in 
the Likelihood function (I12> for source localization. Indeed, 
this model has a computationally manageable inverse with 
calculation time depending on r and ranging between the 
times for the one-pair model and the multi-pair models. For 
the orthogonal basis multi-pair model the inverse is: 



^T- 1 t&Si + T- 1 ® S k 



(15) 



k=r+l 



To prove this claim it suffices to show that: 

(^T,- 1 ®5 / + T- 1 ® £ S fc j (16) 



«J=1 



k=r+l 
L 



x E Tj ® 5 ' +T ® E Sk ) =I - 

\i=l k=r+l / 

To proceed we need the following properties of Sf. 
• property (1) 

(Si) 2 = (vivf)(vivf) =vi(vfvi)vf 

= vivf {vfvi = 1 by orthogonality) 
= Si. 



. property (2) : I = V V T = v lV f . 
• property (3) : SiSf = 6(1,1 )Si, where 6(1,1 ) is the 
Kronecker delta. 

Using these properties and performing the multiplication in 
the LHS of ([16}: 



J ® Si + i® J2 Sk (17) 

1=1 k=r+l 
r L 

= I ®^Si + I ® Y S k 

1 = 1 k=r + l 

L 

= I®^Si 

1=1 

and using property (2) 
= I® 1 = 1. 

For the independent basis multi-pair model if we replace 
orthogonal components Si by independent components TZi 
the inverse is: 



^ T r J ® WW T TZiWW q 



(18) 



i=i 



+T- 1 ® ww T n k ww T . 

k=r+l 

The proof proceeds in the similar fashion as for the orthog- 
onal basis model. 

Computational efficiency of this new generalized spa- 
tiotemporal covariance model is now a function of r and 
not fixed to the number of sensors, as in the case of models 
from Section ||1T| Here again, without loss of generality 
we demonstrate results using the orthogonal basis model. 
We redefine ^2 k =r+i ^ k = ^ e negative log-likelihood 
function with the use of generalized covariance is then 
expressed as: 

r 

^(Eb-EMfS^Eb-EM)^ 1 (19) 

i=i 

+ (Eb — Em) T S(Eb — Em)T ■ 

The computational complexity of this calculation is 0((r + 
1)(L 2 C+LC 2 +C 3 )) and is flexible since it is parameterized 
by r. It can be as large as the one of H2\ when the back- 
ground is diverse enough to need a complex representation 
and can be as small as that of the one-pair model. 

The generalized orthogonal basis model from (114-i can 
be estimated in a relatively straightforward manner based 
on the nature of PCA. The number of single Kronecker 
product pairs of our model r + 1 is obtained by choosing 
the most significant components provided by singular value 
decomposition. This is a well justified approach and it fol- 
lows the conventional method of dimensionality reduction in 
PCA. After that the model can be estimated in the Maximal 
Likelihood framework. Before proceeding with its derivation 
we need to calculate the determinant of dl4> . It can be found 



from the determinant of the orthogonal basis multipair model 
derived in [5]: 



(20) 



Noticing that the last J = L — r temporal covariances in the 
product are the same, and are T", we get: 



(21) 



\Y,T l ®S l + T®S\ =T J Y[\Ti\. 

i=i i 

The log-likelihood function L is then written as: 
const -M- (jln\T\ + f2ln\Ti\j 

Differentiating with respect to T results in: 

dC = ^ltr{{T)-HT) 

\tr ( E^5E m (T)- 1 dT(T)- 1 j 

\m=l / 



(22) 



(23) 



2 

MJ 



tr((T)-'dT) 



\tr(^{T)-^lSV m (T)-UT\ 

\rn=l ) 



MJ 



tr 



And the final result is: 



(24) 



Together with the equation (|8) we can estimate the gen- 
eralized model as a sum of r Kronecker products of rank 
one spatial component matrices with their corresponding 
temporal covariance matrices plus a Kronecker product of 
a spatial component matrix of rank J with its corresponding 
temporal covariance matrix. 

This approach will not work for the independent basis 
model. There are no singular values that can be used for 



thresholding. However, we can use the variance of the 
data projected into each independent spatial component to 
perform thresholding based on its value. Then a similar 
Maximal Likelihood estimation as in l|9} can be used for 
the generalized independent basis model. This would add 
asymmetry to the way these two models are estimated. 

Another possible approach can be based on observations 
made in [4]. Many temporal covariances of different spatial 
components were found to be similar. By discovering r 
clusters of similar temporal covariances we can obtain r 
Kronecker products of the generalized multi-pair model. This 
approach is applicable both for orthogonal and independent 
basis models. A similarity criterion could be developed for 
that case, such that it provides clustering of components in 
a manner that balances optimizing source localization with 
reducing computational time. In this second approach, the 
resulting model will be different from the one obtained by 
thresholding of significant components. The model is then 
described as a sum of Kronecker products of spatial compo- 
nent matrices of incomplete rank and their corresponding 
temporal covariances. The difference is that there is no 
requirement of having rank one component matrices and 
having only one spatial component matrix of a bigger rank 
as in J14t . 

It is to be determined which of these two approaches 
shows better localization performance. In future work we 
plan to answer this question and investigate which clustering 
algorithm to use for obtaining summands in the second 
approach. 
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